Experiments on Parallel Training of Deep Neural Network using Model Averaging
نویسندگان
چکیده
In this work we apply model averaging to parallel training of deep neural network (DNN). Parallelization is done in a model averaging manner. Data is partitioned and distributed to different nodes for local model updates, and model averaging across nodes is done every few minibatches. We use multiple GPUs for data parallelization, and Message Passing Interface (MPI) for communication between nodes, which allows us to perform model averaging frequently without losing much time on communication. We investigate the effectiveness of Natural Gradient Stochastic Gradient Descent (NG-SGD) and Restricted Boltzmann Machine (RBM) pretraining for parallel training in modelaveraging framework, and explore the best setups in term of different learning rate schedules, averaging frequencies and minibatch sizes. It is shown that NG-SGD and RBM pretraining benefits parameter-averaging based model training. On the 300h Switchboard dataset, a 9.3 times speedup is achieved using 16 GPUs and 17 times speedup using 32 GPUs with limited decoding accuracy loss. 1
منابع مشابه
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملEmpirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
Deep learning models (DLMs) are state-of-the-art techniques in speech recognition. However, training good DLMs can be time consuming especially for production-size models and corpora. Although several parallel training algorithms have been proposed to improve training efficiency, there is no clear guidance on which one to choose for the task in hand due to lack of systematic and fair comparison...
متن کاملImproving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
متن کاملUAV attitude Sensor Fault Detection Based On Fuzzy Logic and by Neural Network Model Identification
Fault detection has always been important in aviation systems to prevent many accidents. This process is possible in different ways. In this paper, we first identify the longitudinal axis plane model using neural network approach. Then based on the obtained model and using fuzzy logic, the aircraft status sensor fault detection unit was designed. The simulation results show that the fault detec...
متن کاملNon-melanoma skin cancer diagnosis with a convolutional neural network
Background: The most common types of non-melanoma skin cancer are basal cell carcinoma (BCC), and squamous cell carcinoma (SCC). AKIEC -Actinic keratoses (Solar keratoses) and intraepithelial carcinoma (Bowen’s disease)- are common non-invasive precursors of SCC, which may progress to invasive SCC, if left untreated. Due to the importance of early detection in cancer treatment, this study aimed...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1507.01239 شماره
صفحات -
تاریخ انتشار 2015